Dr. M. Baron, Statistical Machine Learning class, STAT-427/627

# DECISION TREES
1. Classification trees
> library(tree)
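The commands below refer to mpg and the Auto data frame directly, so Auto must be loaded and attached first. A minimal setup sketch (assuming the Auto data come from the ISLR package, which these notes do not state explicitly):
> library(ISLR)        # assumed source of the Auto data set
> attach(Auto)         # so that mpg, weight, horsepower, etc. can be used by name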
Define a categorical variable ECO
> ECO = ifelse( mpg > median(mpg), "Economy", "Consuming" )
> Cars = data.frame( Auto, ECO )        # Include ECO into the data set
> table(ECO)
Consuming   Economy
      196       196
# Here is the main tree command.
> tree( ECO ~ .-name, Cars )
1) root 392 543.4 Consuming ( 0.5 0.5 )
  2) mpg < 22.75 196 0.0 Consuming ( 1.0 0.0 ) *
  3) mpg > 22.75 196 0.0 Economy ( 0.0 1.0 ) *
# Of course, classifying ECO based on mpg is trivial!!! The tree picks this obvious split immediately.
# So, we’ll exclude mpg. We would like to predict ECO based on the car’s technical characteristics.
> tree.eco = tree( ECO ~ horsepower + weight + acceleration, data=Cars )
> tree.eco
node), split, n, deviance, yval, (yprob)
      * denotes terminal node
 1) root 392 543.400 Consuming ( 0.50000 0.50000 )
   2) weight < 2764.5 191 123.700 Economy ( 0.09948 0.90052 )
     4) weight < 2224.5 98 11.160 Economy ( 0.01020 0.98980 )
       8) horsepower < 87 92 0.000 Economy ( 0.00000 1.00000 ) *
       9) horsepower > 87 6 5.407 Economy ( 0.16667 0.83333 ) *
     5) weight > 2224.5 93 91.390 Economy ( 0.19355 0.80645 ) *
   3) weight > 2764.5 201 147.000 Consuming ( 0.88060 0.11940 )
     6) horsepower < 93.5 35 48.260 Consuming ( 0.54286 0.45714 )
      12) weight < 2961 8 6.028 Economy ( 0.12500 0.87500 ) *
      13) weight > 2961 27 34.370 Consuming ( 0.66667 0.33333 )
        26) acceleration < 19.55 17 12.320 Consuming ( 0.88235 0.11765 )
          52) weight < 3018 5 6.730 Consuming ( 0.60000 0.40000 ) *
          53) weight > 3018 12 0.000 Consuming ( 1.00000 0.00000 ) *
        27) acceleration > 19.55 10 12.220 Economy ( 0.30000 0.70000 )
          54) weight < 3260 5 0.000 Economy ( 0.00000 1.00000 ) *
          55) weight > 3260 5 6.730 Consuming ( 0.60000 0.40000 ) *
     7) horsepower > 93.5 166 64.130 Consuming ( 0.95181 0.04819 )
      14) weight < 2953.5 21 25.130 Consuming ( 0.71429 0.28571 )
        28) acceleration < 14.45 5 5.004 Economy ( 0.20000 0.80000 ) *
        29) acceleration > 14.45 16 12.060 Consuming ( 0.87500 0.12500 ) *
      15) weight > 2953.5 145 21.110 Consuming ( 0.98621 0.01379 )
        30) acceleration < 17.3 128 0.000 Consuming ( 1.00000 0.00000 ) *
        31) acceleration > 17.3 17 12.320 Consuming ( 0.88235 0.11765 ) *
Example – the first internal node (the first split). The tree splits all 392 cars in the root into 191 cars with weight < 2764.5 (node 2) and the other 201 cars with weight > 2764.5 (node 3). Among the first group of 191 cars, the deviance is 123.7 (the smaller, the better the fit). Without any additional data, the tree classifies these cars as “Economy”, the class that makes up 90% of this group. The other 201 cars with weight > 2764.5 are classified by the tree as “Consuming”, and in fact, 88% of them are “Consuming”.
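The printed node deviance can be checked by hand: for a classification tree, the deviance of a node is -2 * sum over classes of n_k * log(n_k / n). A quick check for node 2, using the class counts 19 “Consuming” and 172 “Economy” implied by the proportions above (the object counts is just an illustrative name):
> counts = c(19, 172)                               # Consuming and Economy cars in node 2
> -2 * sum( counts * log(counts / sum(counts)) )    # ≈ 123.7, the deviance printed for node 2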
Plot. This tree can be visualized.
> plot(tree.eco, type="uniform")   # The default type is “proportional”: branch lengths are proportional to the decrease in deviance at each split
> text(tree.eco)
> summary(tree.eco)
Number of terminal nodes: 12
Residual mean deviance: 0.3833 = 145.7 / 380
Misclassification error rate: 0.07398 = 29 / 392
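(In the residual mean deviance above, the denominator 380 = 392 observations − 12 terminal nodes.)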
This misclassification rate is deceptive, because it is based on the training data.
Cross-validation. Estimate the correct classification rate by cross-validation…
> n = length(ECO);  Z = sample(n, n/2)
> tree.eco = tree( ECO ~ horsepower + weight + acceleration, data=Cars[Z,] )
> ECO.predict = predict( tree.eco, Cars, type="class" )
> table( ECO.predict[-Z], ECO[-Z] )              # Confusion matrix
              Consuming Economy
    Consuming        77       5
    Economy          25      89
> mean( ECO.predict[-Z] == ECO[-Z] )
[1] 0.8469388
That’s the classification rate: 84.7% of cars are classified correctly by our tree.
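The 84.7% figure depends on one random 50/50 split. One possible sketch that repeats the split several times and averages the resulting test rates for a more stable estimate (rates and fit are illustrative names, not part of the original commands):
> rates = replicate( 10, {
+    Z = sample(n, n/2)
+    fit = tree( ECO ~ horsepower + weight + acceleration, data=Cars[Z,] )
+    mean( predict(fit, Cars[-Z,], type="class") == ECO[-Z] )
+ })
> mean(rates)      # average classification rate over 10 random splits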
Pruning. There is built-in cross-validation to determine the optimal complexity of a tree…
> cv = cv.tree( tree.eco )
> cv
$size                  # Number of terminal nodes
[1] 8 7 6 4 3 2 1
$dev                   # Deviance
[1] 163.8182 155.7427 121.3029 106.0917 121.2573 145.9874 273.9971
$k                     # Complexity parameter
[1]       -Inf   2.866748   7.045125   7.727424  25.296234  34.517285 137.695468
Deviance is minimized by 4 terminal nodes.
> plot(cv)
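The optimal size can also be extracted from the cv object programmatically instead of being read off the list or the plot; a small sketch using the output above:
> cv$size[ which.min(cv$dev) ]     # size with the smallest cross-validated deviance
[1] 4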
# Optimize by the smallest misclassification error instead of deviance…
> cv = cv.tree( tree.eco, FUN=prune.misclass )   # By default (without FUN), deviance is the pruning criterion.
> cv                                             # With FUN=prune.misclass, the misclassification rate is the criterion.
$size
[1]  8  7  6  2  1
$dev                  # Now (despite the name) this is the number of misclassified units
[1] 26 26 28 25 94    # It is minimized by size = 2
> plot(cv)
# We can now prune the tree to the optimal size…
> tree.eco.pruned = prune.misclass( tree.eco, best=2 )
> tree.eco.pruned
1) root 196 271.40 Economy ( 0.47959 0.52041 )
  2) weight < 3050.5 114 96.03 Economy ( 0.14912 0.85088 ) *
  3) weight > 3050.5 82 37.66 Consuming ( 0.93902 0.06098 ) *
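Since tree.eco was grown on the training half Cars[Z,], the pruned tree can be evaluated on the held-out half in the same way as before. A sketch (the name ECO.predict.pruned is illustrative, and the resulting rate depends on the random split):
> ECO.predict.pruned = predict( tree.eco.pruned, Cars[-Z,], type="class" )
> mean( ECO.predict.pruned == ECO[-Z] )        # test classification rate of the pruned tree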
2. Regression trees
> tree.mpg = tree( mpg ~ .-name-origin+as.factor(origin), Auto )
> tree.mpg
node), split, n, deviance, yval
      * denotes terminal node

 1) root 392 23820.0 23.45
   2) displacement < 190.5 222 7786.0 28.64
     4) horsepower < 70.5 71 1804.0 33.67
       8) year < 77.5 28 280.2 29.75 *
       9) year > 77.5 43 814.5 36.22 *
     5) horsepower > 70.5 151 3348.0 26.28
      10) year < 78.5 94 1222.0 24.12
        20) weight < 2305 39 362.2 26.71 *
        21) weight > 2305 55 413.7 22.29 *
      11) year > 78.5 57 963.7 29.84
        22) weight < 2580 24 294.2 33.12 *
        23) weight > 2580 33 225.0 27.46 *
   3) displacement > 190.5 170 2210.0 16.66
     6) horsepower < 127 74 742.0 19.44 *
     7) horsepower > 127 96 457.1 14.52 *
# Now, the prediction at each terminal node is the sample mean of mpg in that node.
> plot(tree.mpg, type="uniform"); text(tree.mpg)
> summary(tree.mpg)
Regression tree:
tree(formula = mpg ~ . - name - origin + as.factor(origin), data = Auto)
Variables actually used in tree construction:
[1] "displacement" "horsepower"   "year"         "weight"
Number of terminal nodes:  8
Residual mean deviance:  9.346 = 3589 / 384
Distribution of residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 -9.4170  -1.5190  -0.2855   0.0000   1.7150  18.5600
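As with the classification tree, the residual mean deviance above is computed from the training data. A validation-set estimate of the prediction error can be obtained the same way; a sketch, assuming a random 50/50 split (tree.mpg.train and mpg.hat are illustrative names):
> n = nrow(Auto);  Z = sample(n, n/2)
> tree.mpg.train = tree( mpg ~ .-name-origin+as.factor(origin), data=Auto[Z,] )
> mpg.hat = predict( tree.mpg.train, Auto[-Z,] )
> mean( (Auto$mpg[-Z] - mpg.hat)^2 )           # estimated test mean squared error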